Maximum Likelihood Estimation

Version 1.5: May 12, 2004. License: http://creativecommons.org/licenses/by/1.0

Authors

  • Clayton Scott
  • Robert Nowak
Abstract

This module introduces the maximum likelihood estimator. We show how the MLE implements the likelihood principle. Methods for computing the MLE are covered. Properties of the MLE are discussed, including asymptotic efficiency and invariance under reparameterization.

The maximum likelihood estimator (MLE) is an alternative to the minimum variance unbiased estimator (MVUE). For many estimation problems, the MVUE does not exist. Moreover, even when it does exist, there is no systematic procedure for finding it. In contrast, the MLE does not necessarily satisfy any optimality criterion, but it can almost always be computed, either through exact formulas or numerical techniques. For this reason, the MLE is one of the most common estimation procedures used in practice.

The MLE is an important type of estimator for the following reasons:

1. The MLE implements the likelihood principle.
2. MLEs are often simple and easy to compute.
3. MLEs have asymptotic optimality properties (consistency and efficiency).
4. MLEs are invariant under reparameterization.
5. If an efficient estimator exists, it is the MLE.
6. In signal detection with unknown parameters (composite hypothesis testing), MLEs are used in implementing the generalized likelihood ratio test (GLRT).

This module will discuss these properties in detail, with examples.

1 The Likelihood Principle

Suppose the data X are distributed according to the density or mass function p(x | θ). The likelihood function for θ is defined by

$$ l(\theta \mid x) \equiv p(x \mid \theta) $$

At first glance, the likelihood function is nothing new: it is simply a way of rewriting the pdf/pmf of X. The difference between the likelihood and the pdf or pmf is what is held fixed and what is allowed to vary. When we talk about the likelihood, we view the observation x as being fixed and the parameter θ as freely varying.

note: It is tempting to view the likelihood function as a probability density for θ, and to think of l(θ | x) as the conditional density of θ given x. This approach to parameter estimation is called fiducial inference, and is not accepted by most statisticians. One potential problem, for example, is that in many cases l(θ | x) is not integrable (∫ l(θ | x) dθ → ∞) and thus cannot be normalized. A more fundamental problem is that θ is viewed as a fixed quantity, as opposed to random, so it does not make sense to talk about its density. For the likelihood to be properly thought of as a density, a Bayesian approach is required.

The likelihood principle effectively states that all information we have about the unknown parameter θ is contained in the likelihood function.

Rule 1: Likelihood Principle
The information brought by an observation x about θ is entirely contained in the likelihood function p(x | θ). Moreover, if x1 and x2 are two observations depending on the same parameter θ, such that there exists a constant c satisfying p(x1 | θ) = c p(x2 | θ) for every θ, then they bring the same information about θ and must lead to identical estimators.

In the statement of the likelihood principle, it is not assumed that the two observations x1 and x2 are generated according to the same model, as long as the model is parameterized by θ.
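To make the fixed-x, varying-θ reading of l(θ | x) concrete before turning to an example, here is a small numerical sketch (added for illustration; it is not part of the original module and assumes NumPy and SciPy are available). It evaluates a binomial pmf on a grid of θ values for one fixed observation; consistent with the note above, the resulting function of θ does not integrate to one.

```python
# The "likelihood view" of a pmf: hold the observation x fixed and let
# theta vary.  Illustrative sketch; assumes NumPy and SciPy.
import numpy as np
from scipy.stats import binom

n, x = 10, 7                        # one fixed observation: 7 successes in 10 trials
thetas = np.linspace(0.001, 0.999, 999)

# l(theta | x) = p(x | theta), read as a function of theta.
lik = binom.pmf(x, n, thetas)

# As a function of x (theta fixed) the pmf sums to 1, but as a function
# of theta (x fixed) it integrates to 1/(n+1), not 1:
print(lik.sum() * (thetas[1] - thetas[0]))  # ~0.0909 = 1/11
print(thetas[np.argmax(lik)])               # 0.7 = x/n, the maximizer
```

The maximizer x/n printed on the last line previews the maximum likelihood estimator defined in Section 2.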
Example 1
Suppose a public health official conducts a survey to estimate 0 ≤ θ ≤ 1, the proportion of the population eating pizza at least once per week. The official finds nine people who have eaten pizza in the last week, and three who have not. If no additional information is available regarding how the survey was implemented, then there are at least two probability models we can adopt.

1. The official surveyed 12 people, and 9 of them had eaten pizza in the last week. In this case, we observe x1 = 9, where x1 ∼ Binomial(12, θ). The pmf of x1 is

$$ f(x_1 \mid \theta) = \binom{12}{x_1} \theta^{x_1} (1-\theta)^{12-x_1} $$

2. Another reasonable model is to assume that the official surveyed people until he found 3 non-pizza eaters. In this case, we observe x2 = 12, where x2 ∼ NegativeBinomial(3, 1 − θ). The pmf of x2 is

$$ g(x_2 \mid \theta) = \binom{x_2 - 1}{3 - 1} \theta^{x_2 - 3} (1-\theta)^{3} $$

The likelihoods for these two models are proportional:

$$ l(\theta \mid x_1) \propto l(\theta \mid x_2) \propto \theta^{9} (1-\theta)^{3} $$

Therefore, any estimator that adheres to the likelihood principle will produce the same estimate of θ, regardless of which of the two data-generation models is assumed.

The likelihood principle is widely accepted among statisticians. In the context of parameter estimation, any reasonable estimator should conform to the likelihood principle. As we will see, the maximum likelihood estimator does.

note: While the likelihood principle itself is a fairly reasonable assumption, it can also be derived from two somewhat more intuitive assumptions known as the sufficiency principle and the conditionality principle. See Casella and Berger, Chapter 6 [1].

2 The Maximum Likelihood Estimator

The maximum likelihood estimator θ̂(x) is defined by

$$ \hat{\theta} = \operatorname*{argmax}_{\theta} \; l(\theta \mid x) $$

Intuitively, we are choosing θ to maximize the probability of occurrence of the observation x.

note: It is possible that multiple parameter values maximize the likelihood for a given x. In that case, any of these maximizers can be selected as the MLE. It is also possible that the likelihood is unbounded, in which case the MLE does not exist.

The MLE rule is an implementation of the likelihood principle. If we have two observations whose likelihoods are proportional (they differ by a constant that does not depend on θ), then the value of θ that maximizes one likelihood will also maximize the other. In other words, both likelihood functions lead to the same inference about θ, as required by the likelihood principle.

Understand that maximum likelihood is a procedure, not an optimality criterion. From the definition of the MLE, we have no idea how close it comes to the true parameter value relative to other estimators. In contrast, the MVUE is defined as the estimator satisfying a certain optimality criterion; however, unlike the MLE, we have no clear procedure to follow to compute the MVUE.

3 Computing the MLE

If the likelihood function is differentiable, then θ̂ is found by differentiating the likelihood (or log-likelihood), equating to zero, and solving:

$$ \frac{\partial}{\partial \theta} \log l(\theta \mid x) = 0 $$

If multiple solutions exist, then the MLE is the solution that maximizes log l(θ | x), that is, the global maximizer. In certain cases, such as pdfs or pmfs with an exponential form, this equation can be solved in closed form using calculus and standard linear algebra. When no closed form is available, the likelihood can be maximized numerically, as in the sketch below.
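The following sketch (added for illustration; it is not part of the original module and assumes NumPy and SciPy) maximizes the two likelihoods from Example 1 over a grid of θ values and confirms that both models lead to the same estimate, θ̂ = 9/12 = 0.75, exactly as the likelihood principle demands.

```python
# Numerically maximizing the two likelihoods from Example 1.
# Both should give the same MLE, theta_hat = 9/12 = 0.75.
import numpy as np
from scipy.stats import binom, nbinom

thetas = np.linspace(0.001, 0.999, 999)

# Model 1: 9 pizza eaters out of a fixed sample of 12.
lik1 = binom.pmf(9, 12, thetas)

# Model 2: survey until 3 non-pizza eaters are found; 12 people total.
# In SciPy's parameterization, nbinom.pmf(k, n, p) is the probability of
# k failures before the n-th success.  Here "success" = non-pizza eater
# (probability 1 - theta), and we saw k = 12 - 3 = 9 pizza eaters.
lik2 = nbinom.pmf(9, 3, 1 - thetas)

print(thetas[np.argmax(lik1)])  # 0.75
print(thetas[np.argmax(lik2)])  # 0.75

# The two likelihoods differ only by a constant factor in theta:
print(np.allclose(lik1 / lik2, (lik1 / lik2)[0]))  # True
```

The final check verifies the proportionality claimed in Example 1: the ratio of the two likelihoods is the same constant at every θ, so their maximizers must coincide.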
Example 2: DC level in white Gaussian noise
Suppose we observe an unknown amplitude in white Gaussian noise with unknown variance:

$$ x_n = A + w_n, \quad n \in \{0, 1, \ldots, N-1\} $$

where the wn ∼ N(0, σ²) are independent and identically distributed. We would like to estimate θ = (A, σ²)ᵀ by computing the MLE. The log-likelihood is

$$ \log p(x \mid \theta) = -\frac{N}{2} \log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2} \sum_{n=0}^{N-1} (x_n - A)^2 $$

Differentiating the log-likelihood with respect to A gives

$$ \frac{\partial}{\partial A} \log p(x \mid \theta) = \frac{1}{\sigma^2} \sum_{n=0}^{N-1} (x_n - A) $$

Equating to zero and solving yields the sample mean,

$$ \hat{A} = \frac{1}{N} \sum_{n=0}^{N-1} x_n $$

which holds whatever the value of σ². Differentiating with respect to σ² gives

$$ \frac{\partial}{\partial \sigma^2} \log p(x \mid \theta) = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{n=0}^{N-1} (x_n - A)^2 $$

and substituting A = Â before equating to zero yields

$$ \widehat{\sigma^2} = \frac{1}{N} \sum_{n=0}^{N-1} \left(x_n - \hat{A}\right)^2 $$
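A short simulation (again an added sketch, not part of the original module; it assumes NumPy, and the parameter values are arbitrary) draws N samples of a DC level in white Gaussian noise and evaluates the closed-form MLEs derived above.

```python
# Example 2 in simulation: estimate a DC level A and noise variance sigma^2
# from x_n = A + w_n with w_n ~ N(0, sigma^2) i.i.d.
import numpy as np

rng = np.random.default_rng(0)
A_true, var_true, N = 2.0, 0.5, 10_000   # arbitrary illustrative values

x = A_true + rng.normal(0.0, np.sqrt(var_true), size=N)

A_hat = x.mean()                         # MLE of A: the sample mean
var_hat = np.mean((x - A_hat) ** 2)      # MLE of sigma^2: divides by N, not N-1

print(A_hat, var_hat)                    # both close to the true values (2.0, 0.5)
```

Note that the variance MLE divides by N rather than N − 1, so it is biased for finite N; this is one sense in which the MLE need not coincide with an unbiased estimator such as the MVUE.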
